ABSTRACT

The usefulness of an extra sum of squares statistic $Q_K$ for detecting K outliers has been discussed previously in the context of two-way tables. (See Gentleman and Wilk, 1975a, 1975b; John and Draper, 1978; and Draper and John, 1980.) That work is extended here to straight line regression situations arising from and motivated by a specific set of research data. Percentage points for the appropriate test statistics are obtained by simulation, approximations for these percentage points are suggested, and power calculations are made for various designs and outlier situations. Correct determination of K and of the position(s) of the outlier(s) appears to be important in influencing power.
SIGNIFICANCE AND EXPLANATION

Previous work in two-way tables on the use of an extra sum of squares statistic $Q_K$ to detect outliers is extended to the straight line regression situation, motivated by some specific research data. Percentage points for the appropriate test statistics are obtained via simulation, and approximations for these percentage points are suggested. Power calculations, made for various derived designs and outlier situations, show the effects of the choice of X, the number of potential outliers assumed, and the positions of the outliers in the predictor variable space.
PERCENTAGE POINTS AND POWER CALCULATIONS FOR
OUTLIER TESTS IN A REGRESSION SITUATION

Camil Fuchs* and Norman R. Draper**

1. INTRODUCTION AND NOTATION

In prior work by Gentleman and Wilk (1975a, 1975b), by John and Draper (1978), and by Draper and John (1980), the utility of the $Q_K$ statistic for checking outliers in data used to fit a linear model was explored. If one or more observations (K in general) are suspect, extra dummy variables can be inserted in the model to represent the discrepancies, and the $Q_K$ statistic is simply the extra sum of squares due to the estimates of the parameters associated with the dummy variables. Specifically, if our original model is $y = X\beta + \epsilon$, we can write

$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}\beta + \begin{pmatrix} 0 \\ I_K \end{pmatrix}\theta + \epsilon, \qquad (1.1)$$

where $y = (y_1', y_2')'$ is an $n \times 1$ vector of response observations, $X = (X_1', X_2')'$ is an $n \times p$ matrix of predictor variable values, $\beta$ is a $p \times 1$ vector of model parameters, $\theta$ is a $K \times 1$ vector of additional parameters, and $\epsilon$ is an $n \times 1$ vector of random errors distributed $\epsilon \sim N(0, I\sigma^2)$. The positioning of the K potential outliers as the last elements of y is for convention only.
If, a priori, the number K and the locations implied by the choice of $X_2$ are specified, and if $s^2$ is an unbiased estimate of $\sigma^2$ independent of $Q_K$, then

$$F_K = (Q_K/K)/s^2 \qquad (1.2)$$

has a non-central F distribution with noncentrality parameter

$$\lambda = \theta'(I - H_{22})\theta/(2\sigma^2), \qquad (1.3)$$

where $H_{22}$ is the $K \times K$ lower right part of $H = X(X'X)^{-1}X'$ when H is partitioned as

$$H = \begin{pmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{pmatrix} = \begin{pmatrix} X_1(X'X)^{-1}X_1' & X_1(X'X)^{-1}X_2' \\ X_2(X'X)^{-1}X_1' & X_2(X'X)^{-1}X_2' \end{pmatrix}. \qquad (1.4)$$

(See Cook, 1979; Ellenberg, 1976.) $H = ((h_{ij}))$ is sometimes called the hat matrix, because $\hat{y} = Hy$, so the h values give the relative contributions ("leverage") of the corresponding observations to the fitted value at each point. Note that, in general, $\operatorname{trace} H = p$, the number of parameters in the model. For K = 1, the parameter of noncentrality is $\lambda_1 = \theta_n^2(1 - h_{nn})/(2\sigma^2)$.
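As a numerical illustration of (1.3) and (1.4), the following minimal NumPy sketch builds the hat matrix for a straight-line design, checks that trace H = p, and evaluates λ for a single outlier. It assumes Design 2 of Table 2 (x-sites 0, 1, 3, 5, 7, 10, 15 with frequencies 8, 2, 2, 2, 2, 2, 8) and σ = 1. The helper names `hat_matrix` and `noncentrality` are illustrative and are not taken from the original study.

```python
import numpy as np


def hat_matrix(X):
    """Return H = X (X'X)^{-1} X', using a linear solve instead of an explicit inverse."""
    return X @ np.linalg.solve(X.T @ X, X.T)


def noncentrality(theta, H22, sigma=1.0):
    """Eq. (1.3): lambda = theta' (I - H22) theta / (2 sigma^2)."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    M = np.eye(theta.size) - np.atleast_2d(H22)
    return float(theta @ M @ theta) / (2.0 * sigma**2)


# Design 2 of Table 2 (n = 26); a straight line, so p = 2.
x = np.repeat([0, 1, 3, 5, 7, 10, 15], [8, 2, 2, 2, 2, 2, 8]).astype(float)
X = np.column_stack([np.ones_like(x), x])
H = hat_matrix(X)
print("trace H =", np.trace(H))

# The paper's location 14 is 0-based index 13 (x = 5).
idx = [13]
H22 = H[np.ix_(idx, idx)]
for theta in (3.0, 5.0):
    # For K = 1 this is theta^2 (1 - h) / (2 sigma^2).
    print(theta, round(noncentrality([theta], H22), 3))
```

With θ expressed in units of σ, the printed values should be close to the λ1 entries reported for this configuration in Table 4.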

In most practical situations, the F distribution is not appropriate because the number and the locations of the outliers have to be elucidated from the data. In that case, the quadratic form $Q_K$ has to be calculated for all possible sets of K omissions. The largest $Q_K$ over all such sets is denoted by $Q_{Km}$.
Because

$$Q_K = e_2'(I - H_{22})^{-1}e_2, \qquad (1.5)$$

where

$$e = (I - H)y = \begin{pmatrix} e_1 \\ e_2 \end{pmatrix}, \qquad (1.6)$$

with the obvious split of the residual vector e into the first (n - K) and last K elements, it is clear that $Q_{1m}$ corresponds to the square of the largest standardized residual in modulus. In fact, the ith (i = 1, 2, ..., n) $Q_1$ value is

$$Q_{1i} = e_i^2/(1 - h_{ii}). \qquad (1.7)$$
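The identity behind (1.5) can be checked numerically. The extra sum of squares obtained by adding one dummy column per suspect observation should equal $e_2'(I - H_{22})^{-1}e_2$, and for a single observation it should reduce to (1.7). The sketch below uses simulated data on the Design 1 x-sites of Table 2. The response line, the seed, and the function names are arbitrary illustrative choices, not the study's data.

```python
import numpy as np


def residuals_and_hat(y, X):
    """Least-squares residuals e = (I - H) y and the hat matrix H."""
    H = X @ np.linalg.solve(X.T @ X, X.T)
    return y - H @ y, H


def q_k(e, H, idx):
    """Eq. (1.5): Q_K = e2' (I - H22)^{-1} e2 for the observations listed in idx."""
    idx = list(idx)
    e2 = e[idx]
    M = np.eye(len(idx)) - H[np.ix_(idx, idx)]
    return float(e2 @ np.linalg.solve(M, e2))


def sse(y, A):
    """Residual sum of squares from regressing y on the columns of A."""
    coef = np.linalg.lstsq(A, y, rcond=None)[0]
    r = y - A @ coef
    return float(r @ r)


def q_k_dummy(y, X, idx):
    """Extra sum of squares due to one dummy variable per suspect observation."""
    idx = list(idx)
    D = np.zeros((len(y), len(idx)))
    D[idx, np.arange(len(idx))] = 1.0
    return sse(y, X) - sse(y, np.column_stack([X, D]))


rng = np.random.default_rng(1)
x = np.repeat([0, 1, 3, 5, 7, 10, 15], [4, 6, 3, 6, 2, 4, 1]).astype(float)  # Design 1
X = np.column_stack([np.ones_like(x), x])
y = 6.0 + 10.0 * x + rng.standard_normal(x.size)   # illustrative line, sigma = 1

e, H = residuals_and_hat(y, X)
pair = [24, 25]   # the paper's locations 25 and 26, 0-based
print(q_k(e, H, pair), q_k_dummy(y, X, pair))      # equal up to rounding error

# Eq. (1.7): Q_1i = e_i^2 / (1 - h_ii); Q_1m is the largest of these.
q1 = e**2 / (1 - np.diag(H))
print("Q_1m =", q1.max(), "at 0-based index", int(q1.argmax()))
```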

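When K = 2, an interesting subtlety arises. We can always write

$$Q_2 = \frac{r_1^2}{1 - h_{11}} + \frac{(r_2^{(1)})^2}{1 - h_{22}^{(1)}}, \qquad (1.8)$$

where $h_{22}^{(1)} = h_{22} + h_{12}^2/(1 - h_{11})$ is the leverage of the second location in the fit with the first observation deleted.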
In this expression, $r_1$ denotes one of the residuals $e_i$ whose location provides the largest $Q_2$, and $r_2^{(1)}$ denotes the residual that would occur in the second location that provides the largest $Q_2$ if the observation in the first location were dropped from the data. (Note that we can also write

$$Q_2 = \frac{(r_1^{(2)})^2}{1 - h_{11}^{(2)}} + \frac{r_2^2}{1 - h_{22}}, \qquad (1.9)$$

since the relationship is perfectly symmetrical.) Now, if $r_1/(1 - h_{11})^{1/2}$ is the largest in modulus standardized residual, then $r_2^{(1)}/(1 - h_{22}^{(1)})^{1/2}$ is the largest in modulus (adjusted) standardized residual obtained after removal of the observation corresponding to $r_1$. (Similar remarks apply with subscripts 1 and 2 reversed.) However, neither $r_1/(1 - h_{11})^{1/2}$ nor $r_2/(1 - h_{22})^{1/2}$ need be the largest in modulus standardized residual! In our subsequent simulations, we nevertheless assume that one of them is, and so compute $Q_{2m}$ in a "stepwise" fashion. This greatly simplifies the simulation procedure for K = 2. It is an adequate approximation because, in practice, the correct and approximate simulation results appear to be identical well over 99% of the time. (To test this, we performed 3000 simulations calculating $Q_{2m}$ both directly and in stepwise fashion. The results were identical in all except 8 cases.) Similar considerations would apply for K ≥ 3.
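The stepwise shortcut described above can be compared with the exhaustive search over all $\binom{n}{2}$ pairs. The sketch below assumes the Design 1 x-sites and standard normal errors. By (1.8), the stepwise value is exactly $Q_2$ for the pair it selects, so it can never exceed the exhaustive maximum. The fraction of samples in which the two coincide is the analogue of the agreement rate quoted in the text. Function names are illustrative.

```python
import itertools

import numpy as np


def fit(y, X):
    """Residuals and leverages h_ii for a least-squares fit."""
    H = X @ np.linalg.solve(X.T @ X, X.T)
    return y - H @ y, np.diag(H)


def q_pair(y, X, pair):
    """Eq. (1.5) with K = 2 for one pair of 0-based indices."""
    H = X @ np.linalg.solve(X.T @ X, X.T)
    e = y - H @ y
    idx = list(pair)
    M = np.eye(2) - H[np.ix_(idx, idx)]
    return float(e[idx] @ np.linalg.solve(M, e[idx]))


def q2m_direct(y, X):
    """Exhaustive Q_2m: maximize Q_2 over all C(n, 2) pairs."""
    H = X @ np.linalg.solve(X.T @ X, X.T)
    e = y - H @ y
    best, best_pair = -np.inf, None
    for pair in itertools.combinations(range(len(y)), 2):
        idx = list(pair)
        M = np.eye(2) - H[np.ix_(idx, idx)]
        q = float(e[idx] @ np.linalg.solve(M, e[idx]))
        if q > best:
            best, best_pair = q, pair
    return best, best_pair


def q2m_stepwise(y, X):
    """Stepwise Q_2m: largest Q_1, then largest Q_1 after deleting that point (Eq. 1.8)."""
    e, h = fit(y, X)
    q1 = e**2 / (1 - h)
    i = int(np.argmax(q1))
    keep = np.delete(np.arange(len(y)), i)
    e_r, h_r = fit(y[keep], X[keep])          # residuals r^(1) and leverages h^(1)
    q1_r = e_r**2 / (1 - h_r)
    k = int(np.argmax(q1_r))
    return float(q1[i] + q1_r[k]), (i, int(keep[k]))


rng = np.random.default_rng(2)
x = np.repeat([0, 1, 3, 5, 7, 10, 15], [4, 6, 3, 6, 2, 4, 1]).astype(float)  # Design 1
X = np.column_stack([np.ones_like(x), x])
n_sim, agree = 100, 0
for _ in range(n_sim):
    y = rng.standard_normal(x.size)
    step_val = q2m_stepwise(y, X)[0]
    direct_val = q2m_direct(y, X)[0]
    agree += bool(np.isclose(step_val, direct_val))
print(f"stepwise equals exhaustive in {agree} of {n_sim} samples")
```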

The $Q_{Km}$ and the corresponding residual mean square $s_K^2$ (from the fit in which the K selected observations are represented by dummy variables) provide the test statistic

$$F_{Km} = (Q_{Km}/K)/s_K^2. \qquad (1.10)$$
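To make (1.10) concrete, the following sketch computes $Q_{Km}$ by exhaustive search over all $\binom{n}{K}$ subsets and then forms $F_{Km}$. It assumes that $s_K^2$ is the residual mean square of the model in which the K selected observations receive their own dummy parameters, that is, $(\mathrm{SSE} - Q_{Km})/(n - p - K)$. The planted outlier, the seed, and the function name are hypothetical.

```python
import itertools

import numpy as np


def f_km(y, X, K):
    """Return (F_Km, selected 0-based indices) with F_Km = (Q_Km / K) / s_K^2, Eq. (1.10).

    Assumes s_K^2 = (SSE - Q_Km) / (n - p - K), the residual mean square after the K
    selected observations are given their own dummy parameters.
    """
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)
    e = y - H @ y
    sse = float(e @ e)
    best_q, best_idx = -np.inf, None
    for subset in itertools.combinations(range(n), K):   # all C(n, K) subsets
        idx = list(subset)
        M = np.eye(K) - H[np.ix_(idx, idx)]
        q = float(e[idx] @ np.linalg.solve(M, e[idx]))    # Eq. (1.5)
        if q > best_q:
            best_q, best_idx = q, idx
    s2_k = (sse - best_q) / (n - p - K)
    return (best_q / K) / s2_k, best_idx


rng = np.random.default_rng(3)
x = np.repeat([0, 1, 3, 5, 7, 10, 15], [4, 6, 3, 6, 2, 4, 1]).astype(float)  # Design 1
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 3.0 * x + rng.standard_normal(x.size)
y[11] += 8.0              # hypothetical planted outlier (paper's location 12)

for K in (1, 2):
    F, idx = f_km(y, X, K)
    print(K, round(F, 2), idx)
```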

The distribution of $F_{Km}$ is non-standard and unknown. In previous work, percentage points have been generated for various specific cases of two-way tables. See John and Draper (1978) and Draper and John (1980), both for details and for useful approximations to those percentage points for K = 1, 2, 3.

The present study focuses on the case of straight line regression and extends previous investigations. It examines critical values for the outlier test based on $F_{Km}$, approximations to those critical values, and the power of these tests. It also examines how the design configuration and misspecification of K affect power. The study was triggered by the analysis of data from an experiment on the relationship between the number of viable cells injected into the intestine of host rats and the number of γ-glutamyl transpeptidase-positive (GT+) colonies formed in the liver lobes of those animals. The injected cell suspensions were prepared from the livers of donor rats subjected to a standard carcinogen diet of 2-acetylaminofluorene (AAF) concurrent with a two-thirds partial hepatectomy (PH). This AAF/PH regimen has been used extensively in recent years and is clearly toxic to the health of the test animals. The host animals were also subjected to an AAF/PH regimen and were sacrificed 10 days after the PH and the injection of the cells from the donor animals. This experiment is part of an ongoing research study of the mechanism by which carcinogens induce liver cancer in experimental animals. For further details see Laishes and Rolfe (1980).

The data, shown in Table 1, were obtained on three different days, denoted A, B and C, with 13, 8, and 5 surviving (not cannibalized) host rats, respectively.

Table 1. Average Counts of GT(+) Colonies in Two Standard Liver Sections and the Number of Viable Cells Injected ($\times 10^{-3}$)

Laishes and Rolfe (1980) found that a straight line regression model adequately represents the association between x, the number of viable liver cells injected into host rats, and y, the number of GT(+) colonies observed on day ten post injection ($R^2 \approx 0.90$). When the test animals were autopsied, it was observed that two animals from Experiment A were obviously afflicted with a severe cholestatic disorder (exhibiting a yellowish, jaundiced liver). One animal from Experiment C had received, through technical error, an incomplete PH (revealed by the presence of a portion of the median liver lobe). Thus, despite efforts to control animal health, a small percentage of the animals were overly diseased at the time of sampling. The question raised in this experiment is whether regression diagnostics are able to detect the diseased animals as outliers. The concern is that, in other experiments, disease states that could influence liver colony development might be overlooked. It is also desirable to develop tools for detecting animals afflicted by subclinical disease states, which can only be revealed by complete pathological and microbiological work-ups.

The design from the above-mentioned experiment served as an initial design in Monte Carlo simulations intended to investigate the statistics $F_{1m}$ and $F_{2m}$ under various conditions in the linear regression model. (A brief analysis of the original data is given in Section 5.)


2.  DESIGNS  CHOSEN  FOR  SIMULATIONS 


Initially we looked at 15 different designs comprising various numbers of points at the x-locations of the original data. From these 15 we selected the four shown in Table 2 for our investigation. In this table, x and f denote the sites and the frequency of points at each site, respectively. The h values are the corresponding diagonal elements of the matrix $H = X(X'X)^{-1}X'$. We also record the $\Sigma h_{ii}^2$ values at the foot of each column. Smaller values of $\Sigma h_{ii}^2$ denote designs "more robust" to outliers, as described by Box and Draper (1975). The table also shows the locations at which one outlier will be added and at which pairs of outliers will be added, together with the corresponding values of $H_{22}$ and $|I - H_{22}|$. $H_{22}$ is the part of the H matrix that corresponds to the two sites, and $|I - H_{22}|$ is a spatial measure of the positions of these sites, lower values indicating more "remoteness" from the rest of the data. (See Draper and John, 1981.)

(Note: the words "cases" and "locations" have different meanings. "Cases" refers to specific animals in Table 1. "Locations" refers to x-sites counted off in the various designs. For example, location 15 is at x = 5 in Design 1, x = 7 in Design 2, x = 15 in Design 3, and x = 0 in Design 4.)
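The design summaries in Table 2 can be recomputed from the sites and frequencies alone. The sketch below assumes the frequency vectors read from Table 2. It converts the paper's 1-based locations to 0-based indices (for example, locations 25 and 26 become indices 24 and 25). For each design it prints $\Sigma h_{ii}^2$ and the value of $|I - H_{22}|$ for the outlier pair used with that design.

```python
import numpy as np

SITES = [0, 1, 3, 5, 7, 10, 15]
FREQS = {  # frequencies f at each site, read from Table 2
    1: [4, 6, 3, 6, 2, 4, 1],
    2: [8, 2, 2, 2, 2, 2, 8],
    3: [13, 0, 0, 0, 0, 0, 13],
    4: [20, 1, 1, 1, 1, 1, 1],
}
PAIRS = {1: (25, 26), 2: (14, 15), 3: (1, 2), 4: (21, 25)}  # 1-based locations


def hat(freqs):
    """Hat matrix for a straight-line fit on the design with the given frequencies."""
    x = np.repeat(SITES, freqs).astype(float)
    X = np.column_stack([np.ones_like(x), x])
    return X @ np.linalg.solve(X.T @ X, X.T)


summary = {}
for d, f in FREQS.items():
    H = hat(f)
    sum_h2 = float(np.sum(np.diag(H) ** 2))        # robustness measure of Box and Draper
    idx = [loc - 1 for loc in PAIRS[d]]            # convert to 0-based indices
    det = float(np.linalg.det(np.eye(2) - H[np.ix_(idx, idx)]))
    summary[d] = (sum_h2, det)
    print(f"Design {d}: sum h^2 = {sum_h2:.4f}, |I - H22| = {det:.3f}")
```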

The four designs were selected to cover a representative range of the design characteristics observed among the fifteen candidates, within limits. For example, the fifteen $\Sigma h_{ii}^2$ values ranged from 0.1537 to 0.4755 (apart from two designs with 25 points at one end of the x-range and one point at the other, for which $\Sigma h_{ii}^2$ took the uncharacteristic value 1.04). The aim was to enable the assessment of the effects of several factors on the empirical power of the tests, specifically these:

(a) the use of tests based on $F_{1m}$ and on $F_{2m}$, both in the presence of a single outlier and when two outliers are present.

(b) the h-value (and the appropriate $\lambda_1$ value) corresponding to each outlier site.

(c) the values of the elements in the $H_{22}$ matrix corresponding to the two outlier sites.

(d) the value of $|I - H_{22}|$ corresponding to $H_{22}$ from (c).

(e) the $\lambda_2$ value (in the case of two outliers).

By definition, the values of the λ's are functions of the appropriate h and θ values; see Eq. (1.3). The λ's are used here as general measures of departure from the null hypothesis. They are not used in connection with any non-central F-distribution, which would be inappropriate in the present context.

Table 2. Designs Used for Simulation Studies and Locations of Added Outliers

                          Design 1        Design 2        Design 3        Design 4
     x                   f      h        f      h        f      h        f      h
     0                   4      …        8   0.0809     13   0.0769     20   0.0457
     1                   6   0.0667      2   0.0691      0      -        1   0.0394
     3                   3      …        2   0.0511      0      -        1   0.0443
     5                   6   0.0394      2   0.0410      0      -        1   0.0725
     7                   2      …        2   0.0386      0      -        1   0.1239
    10                   4   0.1161      2   0.0496      0      -        1   0.2445
    15                   1   0.3159      8   0.1067     13   0.0769      1   0.5617
    Σh²                     0.2308          0.1695          0.1537          0.4412

    One outlier at
      locations          5, 25, 26       14, 15          1, 14           21, 25
      where x =          1, 10, 15       5, 7            0, 15           1, 10

    Pair of outliers
      at locations       (25,26)         (14,15)         (1,2)           (21,25)
      with H22 =         .116  .185      .041  .038      .077  .077      .039  .024
                         .185  .316      .038  .039      .077  .077      .024  .244
      and |I - H22| =    .570            .921            .846            .725

    Further pairs of outliers at locations (5,25) and (1,14).


3. CRITICAL VALUES FOR $F_{Km}$ IN STRAIGHT LINE REGRESSION MODELS


John and Draper (1978) and Draper and John (1980) suggest the following sequential strategy for determining the number of outliers and their locations in a two-way ANOVA table with one observation per cell:

(a) Determine K, the maximum reasonable number of outliers in the data.

(b) Test $H_0$: no outliers versus $H_1$: there are 1, or 2, ..., or K outliers, by comparing $F_{Km}$ with the appropriate critical value of the null distribution of $F_{Km}$.

(c) If $H_0$ is rejected, the y value associated with the standardized residual of maximum modulus is declared an outlier and deleted, and we return to (b) with K replaced by K - 1. The algorithm stops when $H_0$ is not rejected.

For K = 1, John and Draper (1978) suggest the use of the conservative critical value $F_{1,n-p-1}(\alpha/n)$ derived from the Bonferroni inequality. Here $F_{f_1,f_2}(c)$ denotes the upper 100c% point of a central F-variate with $(f_1, f_2)$ degrees of freedom, that is, $P\{F_{f_1,f_2} > F_{f_1,f_2}(c)\} = c$. For K ≥ 2 (actually, for K = 2, 3), Draper and John (1980) found that, for testing $H_0$ at a specified level α, good approximations to the critical values are obtained by setting


$$m = \tfrac{3}{4}n\{1 - (K+1)n/1600\} \qquad (3.1)$$

in Andrews' (1971) formula

(3.2) [formula not legible in the source copy]

which, in its original context, provided a bound on the probability of obtaining K extreme residuals.
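The sequential strategy (a) to (c) above can be sketched as follows. Instead of implementing Andrews' formula (3.2), the sketch uses the Bonferroni value $F_{K,n-p-K}[\alpha/\binom{n}{K}]$ as the critical value at every step. For K = 1 this is the choice described above. For K ≥ 2 it is only a conservative stand-in (Table 3 suggests its exceedance rate is about α/2 here), not the Andrews-based approximation. The sketch requires SciPy for the F quantile. All function names, and the planted outlier in the usage example, are hypothetical.

```python
import itertools
from math import comb

import numpy as np
from scipy import stats


def bonferroni_crit(n, p, K, alpha):
    """Upper alpha / C(n, K) point of the central F(K, n - p - K) distribution."""
    return float(stats.f.isf(alpha / comb(n, K), K, n - p - K))


def f_km(y, X, K):
    """F_Km of Eq. (1.10), with Q_Km found by exhaustive search over all K-subsets."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)
    e = y - H @ y
    q_max = max(
        float(e[list(s)] @ np.linalg.solve(np.eye(K) - H[np.ix_(s, s)], e[list(s)]))
        for s in itertools.combinations(range(n), K)
    )
    return (q_max / K) / ((float(e @ e) - q_max) / (n - p - K))


def largest_std_residual(y, X):
    """0-based index of the observation with the largest |e_i| / (1 - h_ii)^{1/2}."""
    H = X @ np.linalg.solve(X.T @ X, X.T)
    e = y - H @ y
    return int(np.argmax(e**2 / (1 - np.diag(H))))


def sequential_outliers(y, X, K_max, alpha=0.05):
    """Steps (a)-(c): test with F_Km, delete one point per rejection, decrease K."""
    remaining = np.arange(len(y))       # original indices still in the data
    outliers = []
    for K in range(K_max, 0, -1):
        yk, Xk = y[remaining], X[remaining]
        n, p = Xk.shape
        if f_km(yk, Xk, K) <= bonferroni_crit(n, p, K, alpha):
            break                        # H0 not rejected: stop
        j = largest_std_residual(yk, Xk)
        outliers.append(int(remaining[j]))
        remaining = np.delete(remaining, j)
    return outliers


rng = np.random.default_rng(4)
x = np.repeat([0, 1, 3, 5, 7, 10, 15], [4, 6, 3, 6, 2, 4, 1]).astype(float)  # Design 1
X = np.column_stack([np.ones_like(x), x])
y = 6.0 + 10.0 * x + rng.standard_normal(x.size)
y[11] += 10.0                            # hypothetical planted outlier
print(sequential_outliers(y, X, K_max=2))
```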

We now proceed to assess empirically the critical values for the tests based on $F_{1m}$ and $F_{2m}$ in a straight line regression setting. The critical values may depend on the design configuration, namely on both the sample size and the x-values. We first evaluate, for fixed sample size, the effect of the design on the critical values. It would be highly desirable if the critical values were relatively constant, so that they would not have to be regenerated for each design. The effect of the design on the null distributions of $F_{1m}$ and $F_{2m}$ was evaluated for the four designs described in Section 2. The empirical null distributions of $F_{1m}$ and $F_{2m}$ were generated using 3000 samples for each design. The upper 10%, 5%, 2.5% and 1% points of these distributions can be found in Table 3. The table also records the values $F_{K,n-p-K}[\alpha/\binom{n}{K}]$, K = 1, 2, derived using the Bonferroni inequality.
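For K = 2, the Bonferroni entries in the last column of Table 3 can be checked without statistical tables. An F variate with 2 numerator degrees of freedom has the closed-form upper tail $P(F_{2,\nu} > f) = (1 + 2f/\nu)^{-\nu/2}$. The sketch below assumes n = 26 and p = 2 (the original design) and inverts that tail at $\alpha/\binom{n}{2}$. The function name is illustrative.

```python
from math import comb


def bonferroni_crit_k2(n, p, alpha):
    """Upper alpha / C(n, 2) point of F(2, nu), nu = n - p - 2, via the closed-form tail."""
    nu = n - p - 2
    tail = alpha / comb(n, 2)
    # Solve (1 + 2 f / nu) ** (-nu / 2) = tail for f.
    return (nu / 2.0) * (tail ** (-2.0 / nu) - 1.0)


table3_k2 = {0.10: 11.943, 0.05: 13.435, 0.025: 15.025, 0.01: 17.285}
for alpha, reported in table3_k2.items():
    print(alpha, round(bonferroni_crit_k2(26, 2, alpha), 3), reported)
```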

First note that the four designs yielded very similar empirical critical values for both $F_{1m}$ and $F_{2m}$.

The empirical percentiles of $F_{1m}$ are very similar to the upper 100(α/n)% points of the $F_{1,n-p-1}$ distribution. In most cases, the empirical percentiles slightly exceeded the theoretical upper bounds $F_{1,n-p-1}(\alpha/n)$ based on the Bonferroni inequality. The Bonferroni critical values for $F_{1m}$ are clearly very tight upper bounds, and their use is recommended for all regression designs. This agrees with Draper and John's procedure for detecting a single outlier in a two-way ANOVA design.
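The Monte Carlo null distribution described above can be sketched for K = 1 in a few vectorized lines. This is not the original simulation program. It assumes NumPy's default generator, the Design 1 x-sites, and 3000 samples, so the resulting percentiles will differ from Table 3 by Monte Carlo error.

```python
import numpy as np


def simulate_f1m(x, n_samples=3000, seed=0):
    """Null distribution of F_1m for a straight-line design.

    beta is set to 0 and sigma to 1 without loss of generality: residuals do not
    depend on X beta, and F_1m is scale-free.
    """
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones_like(x), x])
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)
    h = np.diag(H)
    Y = rng.standard_normal((n_samples, n))
    E = Y @ (np.eye(n) - H)                 # each row is ((I - H) y)'; I - H is symmetric
    q1m = np.max(E**2 / (1 - h), axis=1)    # Eq. (1.7), maximized over observations
    s2 = (np.sum(E**2, axis=1) - q1m) / (n - p - 1)
    return q1m / s2                         # Eq. (1.10) with K = 1


x = np.repeat([0, 1, 3, 5, 7, 10, 15], [4, 6, 3, 6, 2, 4, 1]).astype(float)  # Design 1
F = simulate_f1m(x)
print("upper 10%, 5%, 2.5%, 1% points:", np.quantile(F, [0.90, 0.95, 0.975, 0.99]).round(2))
print("rejection rate at the Bonferroni value 12.257:", np.mean(F > 12.257))
```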
Table 3. Empirical Percentiles of $F_{1m}$ and $F_{2m}$ for the Four Designs Presented in Table 2. The Values Listed in Parentheses are the Relative Frequencies with which $F_{Km}$ Exceeded $F_{K,n-p-K}(\alpha/\binom{n}{K})$, K = 1, 2.

    K = 1
      α        Design 1       Design 2       Design 3       Design 4     F_{K,n-p-K}(α/C(n,K))
     .10     10.33 (.100)   10.42 (.104)   10.35 (.101)   10.59 (.110)        10.334
     .05     12.48 (.053)   12.47 (.052)   12.59 (.056)   12.42 (.051)        12.257
     .025    14.59 (.027)   14.59 (.027)   14.59 (.027)   14.25 (.025)        14.315
     .01     16.97 (.009)   17.32 (.011)   17.76 (.012)   18.06 (.012)        17.246

    K = 2
      α        Design 1       Design 2       Design 3       Design 4     F_{K,n-p-K}(α/C(n,K))
     .10      9.90 (.043)   10.05 (.049)   10.17 (.043)   10.16 (.050)        11.943
     .05     11.51 (.023)   11.86 (.025)   11.66 (.022)   11.96 (.026)        13.435
     .025    13.19 (.012)   13.45 (.012)   13.08 (.012)   13.46 (.013)        15.025
     .01     15.33 (.005)   15.57 (.006)   15.48 (.006)   16.11 (.006)        17.285
The comparison of the percentiles of the $F_{2m}$ distribution for various designs is more relevant. The 95th percentile of the null empirical distribution of $F_{2m}$ in the original design (11.507) corresponds to the 95.7th, 95.4th and 95.8th percentiles of the empirical distributions of $F_{2m}$ generated under Designs 2, 3 and 4. The differences are small at other percentiles as well. Thus it appears that the critical values generated under the original design (either by simulation or computed from an approximation formula) can be safely used with other designs, a reassuring result. In the power study described in the next section, we use the critical values obtained from the null distribution of $F_{2m}$ under the original design. Note that for α = 0.1, 0.05, 0.025 and 0.01, the exceedance rate of $F_{2,n-p-2}[\alpha/\binom{n}{2}]$ is approximately half of the nominal α.


Using the x-values of the original design (Table 1), 3000 samples were generated to provide the null distribution of $F_{2m}$ for 1, 2, 3, 4, 5, and 6 y-values at each of the 26 x-values. This enabled us to list the empirical percentiles of $F_{2m}$ for various n's and α's and to develop an approximating formula for the critical values. Following John and Draper (1978), we do this by estimating Andrews' parameter m in (3.2), with the formula set at prespecified probability values. Specifically, for each value of n = 26j, j = 1, ..., 6, let $F_{(t)}$ be the upper $100\alpha_t$th percentile of the empirical cumulative distribution of $F_{2m}$, where $\alpha_t = 1 - t/3000$, t = 1, ..., 3000. Denote the resulting solution of (3.2) for m by $m_t$. We thus obtained 3000 × 6 triplets $(m_t, \alpha_t, n)$. The extreme 1% in both tails of the six empirical distributions of $F_{2m}$ were deleted because of their greater instability. The equation m/n = a + bn was fitted to the remaining 17,640 points by ordinary least squares, giving m/n = 0.59 - 0.0015n with $R^2 = 0.26$. The fit is obviously not satisfactory. The addition of an α-term to the regression equation yielded m/n = 0.778 - 0.378α - 0.0015n with $R^2 = 0.94$, a definite improvement. (Since the vector of α's is orthogonal to the vector of n's, the coefficient of n remains unchanged when the α-term is added to the equation.) For α = 0.05 the relationship is m = 0.76n(1 - 0.0015n), or roughly $m = \tfrac{3}{4}n(1 - 3n/2000)$.

We thus conclude that, in a straight line regression, the critical values for $F_{2m}$ at α = 0.05 can be obtained by substituting $m = \tfrac{3}{4}n(1 - 3n/2000)$ in (3.2). Note the great similarity to the John and Draper (1978) approximation for the two-way table investigation, $m = \tfrac{3}{4}n(1 - 3n/1600)$. However, if the critical values are sought at other α's, the effect of α on the value of m cannot be ignored. The critical values should then be obtained by substituting in (3.2) the value m = n(0.778 - 0.378α - 0.0015n). Based on the comparisons of the four designs and on the similarity of the approximating formula to the one obtained for two-way designs, we speculate that these approximations are valid over a wide range of designs.
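The approximations for Andrews' parameter m given above are simple enough to tabulate directly. The sketch below evaluates three expressions for the sample sizes n = 26j used in the simulations: the general fitted formula, the α = 0.05 simplification, and the two-way-table formula with 1600 in the denominator. It does not evaluate (3.2) itself, and the function names are illustrative.

```python
def m_general(n, alpha):
    """Fitted approximation for K = 2: m = n (0.778 - 0.378 alpha - 0.0015 n)."""
    return n * (0.778 - 0.378 * alpha - 0.0015 * n)


def m_straight_line(n, K=2):
    """alpha = 0.05 simplification: m = (3/4) n {1 - (K + 1) n / 2000}."""
    return 0.75 * n * (1 - (K + 1) * n / 2000)


def m_two_way(n, K=2):
    """Draper and John (1980) two-way-table formula: m = (3/4) n {1 - (K + 1) n / 1600}."""
    return 0.75 * n * (1 - (K + 1) * n / 1600)


for n in (26, 52, 78, 104, 130, 156):
    print(n, round(m_general(n, 0.05), 2), round(m_straight_line(n), 2), round(m_two_way(n), 2))
```

Printing the three columns side by side shows how closely the expressions track one another over the simulated range of n.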
4. POWER OF THE TESTS BASED ON THE STATISTICS $F_{1m}$ AND $F_{2m}$

We now investigate the behaviour of the tests based on $F_{1m}$ and $F_{2m}$ in the presence of one and two outliers. Specifically, for each of the four designs and the locations of the assumed outliers given in Table 2, we generated 3000 samples according to model (1.1). For convenience we refer to a specific design, together with the sites at which the outliers are located, as a configuration. In all, we evaluated 15 configurations. The outliers were of size ±3σ and ±5σ. The empirical power is defined as the percentage of the 3000 samples for which the statistic $F_{Km}$ exceeds its α = 0.05 critical value. Note that the observation(s) thus identified as outlier(s) may or may not be the actual outlier(s) (although typically they would be).
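The empirical power defined above can be estimated for K = 1 with a short vectorized simulation. This sketch makes the following assumptions:

- σ = 1 and β = 0 (β does not affect the residuals);
- an outlier of size θ is added at a single 0-based index;
- the critical value is the Bonferroni value 12.257 from Table 3, used as a stand-in for the paper's simulated critical value.

The function name `power_f1m` is hypothetical.

```python
import numpy as np


def power_f1m(x, site, theta, crit, n_samples=3000, seed=0):
    """Fraction of samples in which F_1m exceeds crit.

    An outlier of size theta (in units of sigma) is added at 0-based index `site`;
    a rejection counts regardless of which observation produced F_1m, as in the text.
    """
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones_like(x), x])
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)
    h = np.diag(H)
    Y = rng.standard_normal((n_samples, n))
    Y[:, site] += theta                     # model (1.1) with K = 1
    E = Y @ (np.eye(n) - H)
    q1m = np.max(E**2 / (1 - h), axis=1)
    s2 = (np.sum(E**2, axis=1) - q1m) / (n - p - 1)
    return float(np.mean(q1m / s2 > crit))


x2 = np.repeat([0, 1, 3, 5, 7, 10, 15], [8, 2, 2, 2, 2, 2, 8]).astype(float)  # Design 2
for theta in (3.0, 5.0, -3.0, -5.0):
    # The paper's location 14 is 0-based index 13 (x = 5).
    print(theta, power_f1m(x2, site=13, theta=theta, crit=12.257))
```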

4.1 Simulation results, K = 1.

In the case of a single outlier, the simulation study addresses the following issues:

(a) How the power of the test based on $F_{1m}$ varies with the leverage at the outlier's site and with the overall measure of robustness $\Sigma h_{ii}^2$.

(b) How the power of the test based on $F_{2m}$ behaves when only a single outlier is present.

The upper panel of Table 4 compares two configurations with almost equal h-values at the outlier's site but with very different $\Sigma h_{ii}^2$'s. We observe that, for equal outlier size, there is little variation in power between the two configurations.


Table 4. Empirical Power Results (×100) with K = 1

                     Design 2; outlier at site 14      Design 4; outlier at site 21
      θ              λ1       F_1m     F_2m            λ1       F_1m     F_2m
      3σ            4.31      32.7     30.5           4.32      34.5     31.0
      5σ           11.99      90.1     86.6          12.01      89.8     85.8
     -3σ            4.31      34.2     32.4           4.32      32.5     28.6
     -5σ           11.99      90.1     86.4          12.01      90.7     86.6
     h-value at
     outlier's site          .041                              .039
     Σh²                     .169                              .441

                     Design 1; outlier at site 26      Design 4; outlier at site 25
      θ              λ1       F_1m     F_2m            λ1       F_1m     F_2m
      3σ            3.08      23.7     21.8           3.40      24.9     22.6
      5σ            8.55      73.4     68.3           9.45      78.9     73.5
     -3σ            3.08      22.3     19.6           3.40      24.7     22.6
     -5σ            8.55      73.9     69.3           9.45      78.3     73.6
     h-value at
     outlier's site          .316                              .244
     Σh²                     .231                              .441
The lower panel of Table 4 compares two configurations with different h-values at the outlier's site and different (and reversed in direction) $\Sigma h_{ii}^2$'s. Powers are lower than in the upper panel (because of higher h-values) and increase, as expected, with decreasing h-values. This effect is not reversed by the lower $\Sigma h_{ii}^2$ value in the first of the two configurations.

The implication from the whole of Table 4 is that power does not change appreciably with $\Sigma h_{ii}^2$, but increases as the h-value of the outlier site decreases.

In all four configurations in Table 4, the use of the test based on $F_{2m}$ (instead of $F_{1m}$) resulted in a decrease of power.

4.2 Simulation results, K = 2.

In configurations with two outliers, let $\theta_1$ and $\theta_2$ be their respective sizes, so that $\theta = (\theta_1, \theta_2)'$. The following issues are addressed in the simulation study with K = 2:

(a) How the power varies with the elements of $H_{22}$, which are related to the outliers' locations, and with the overall measure of robustness $\Sigma h_{ii}^2$.

(b) How the power varies with the relative position of the outliers.

(c) How the power of the test based on $F_{1m}$ is affected by the presence and the position of the second outlier. The performance of $F_{1m}$ when more than one outlier is present may be of interest when the presence of a second outlier is not recognized, and/or when the test is performed in a "stepwise" fashion (see, e.g., Anscombe, 1960).
In Table 5 we compare the empirical power of the tests in two configurations. The first configuration has both a smaller value of $|I - H_{22}|$ at the outliers' locations and a smaller $\Sigma h_{ii}^2$. From Table 5 we observe:

(a) In general, the power of the tests increases monotonically with $\lambda_2$. (Note that, unlike $\lambda_1$, $\lambda_2$ is not a monotonic function of $|I - H_{22}|$.)

(b) Neither the relative position of the two outliers (measured by $h_{12}$) nor the value of $\Sigma h_{ii}^2$ appears to affect the power of the test based on $F_{2m}$.

(c) When the two outliers have equal signs, the test based on $F_{1m}$ has smaller power than the one based on $F_{2m}$.

(d) The magnitude of the loss of power due to the use of $F_{1m}$ (instead of $F_{2m}$) depends on $h_{12}$. For a large $h_{12}$ value and sign($\theta_1$) = sign($\theta_2$), the decrease in power may be considerable and may reverse the monotonicity with $\lambda_2$. Table 6 presents several additional configurations from all four designs to further illustrate this point.
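The $\lambda_2$ column of Table 5 for Design 1 follows directly from (1.3). The sketch assumes the Design 1 frequencies of Table 2 and the paper's locations 25 and 26 (0-based indices 24 and 25), with θ expressed in units of σ.

```python
import numpy as np

x = np.repeat([0, 1, 3, 5, 7, 10, 15], [4, 6, 3, 6, 2, 4, 1]).astype(float)  # Design 1
X = np.column_stack([np.ones_like(x), x])
H = X @ np.linalg.solve(X.T @ X, X.T)
pair = [24, 25]
M = np.eye(2) - H[np.ix_(pair, pair)]      # I - H22 for the two outlier sites

lam2 = {}
for theta in [(3, 3), (3, 5), (3, -3), (3, -5), (5, 3), (5, 5), (5, -3), (5, -5)]:
    t = np.array(theta, dtype=float)
    lam2[theta] = float(t @ M @ t) / 2.0   # Eq. (1.3) with sigma = 1
    print(theta, round(lam2[theta], 3))
```

Writing (1.3) out for K = 2 gives $\lambda_2 = \{\theta_1^2(1-h_{11}) + \theta_2^2(1-h_{22}) - 2\theta_1\theta_2 h_{12}\}/2$. With $h_{12}$ about 0.185 at these adjacent, high-leverage sites, same-sign pairs therefore receive a smaller $\lambda_2$ than opposite-sign pairs of the same magnitudes.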

Tables 7 and 8 compare the power achieved by $F_{1m}$ in the same design when (a) K = 1 versus (b) K = 2 with $|\theta_1| = |\theta_2|$. The assumed outlier in (a) is included in the pair of outliers in (b). The configurations in Table 7 have similar and relatively small $h_{12}$-values. We observe that, when the h-value of the second outlier is small (Design 2, site 15) and sign($\theta_1$) = sign($\theta_2$), the power obtained when K = 2 is smaller than when K = 1. When the two outliers have opposite signs, the two configurations yield similar power. When the h-value of the second outlier is large (Design 4, site 25), the power when K = 2 is in general smaller than when K = 1. Again, the power decreases when the two outliers have the same signs.
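The masking effect discussed here, in which two same-sign outliers at neighbouring sites depress the power of $F_{1m}$, can be explored with the following sketch. It assumes σ = 1. It evaluates $Q_2$ for all pairs at once using the closed-form inverse of the 2 × 2 matrix $I - H_{22}$, which gives an exhaustive rather than stepwise $Q_{2m}$. The critical values are the Bonferroni value 12.257 for $F_{1m}$ and the Design 1 value 11.507 for $F_{2m}$ quoted in Section 3. The function name is hypothetical.

```python
import itertools

import numpy as np


def power_two_outliers(x, sites, thetas, crit1, crit2, n_samples=2000, seed=0):
    """Empirical power of F_1m and F_2m with outliers `thetas` at 0-based `sites`."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones_like(x), x])
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)
    h = np.diag(H)
    Y = rng.standard_normal((n_samples, n))
    Y[:, list(sites)] += np.asarray(thetas, dtype=float)
    E = Y @ (np.eye(n) - H)
    sse = np.sum(E**2, axis=1)

    q1m = np.max(E**2 / (1 - h), axis=1)
    f1 = q1m / ((sse - q1m) / (n - p - 1))

    # I - H22 = [[a, -b], [-b, c]] for pair (i, j); its inverse is [[c, b], [b, a]] / det.
    i, j = np.array(list(itertools.combinations(range(n), 2))).T
    a, c, b = 1 - h[i], 1 - h[j], H[i, j]
    det = a * c - b**2
    Ei, Ej = E[:, i], E[:, j]
    q2m = np.max((c * Ei**2 + 2 * b * Ei * Ej + a * Ej**2) / det, axis=1)
    f2 = (q2m / 2) / ((sse - q2m) / (n - p - 2))
    return float(np.mean(f1 > crit1)), float(np.mean(f2 > crit2))


x1 = np.repeat([0, 1, 3, 5, 7, 10, 15], [4, 6, 3, 6, 2, 4, 1]).astype(float)  # Design 1
for thetas in [(5.0, 5.0), (5.0, -5.0)]:
    p1, p2 = power_two_outliers(x1, sites=(24, 25), thetas=thetas, crit1=12.257, crit2=11.507)
    print(thetas, "F_1m:", p1, "F_2m:", p2)
```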

In Table 8 we investigate the effect of the relative position of the outliers. When the two outliers are at the extremes of the x-range ($h_{12} = 0$), the power of the $F_{1m}$ test is generally smaller than when K = 1 and is not affected by the signs of the θ's. However, when the two outliers are clustered ($h_{12} = 0.077$) and sign($\theta_1$) ≠ sign($\theta_2$), the power is larger when K = 2 than when K = 1. The opposite is true when sign($\theta_1$) = sign($\theta_2$).

Table 5. Empirical Power Results (×100) when K = 2

                  Design 1; outliers at sites (25,26)     Design 4; outliers at sites (21,25)
    (θ1, θ2)        λ2       F_1m     F_2m                  λ2       F_1m     F_2m
    (3σ, 3σ)       5.389     11.8     20.3                   …       27.5     34.5
    (3σ, 5σ)       9.751     34.9     50.5                   …       63.2     74.6
    (3σ, -3σ)      8.723     52.8     50.8                   …       32.0     39.5
    (3σ, -5σ)     15.306     86.5     84.9                14.130     67.1     75.4
    (5σ, 3σ)      11.350     59.3     64.4                15.042     80.9     83.3
    (5σ, 5σ)      14.971     30.9     76.6                20.843     74.2     93.3
    (5σ, -3σ)     16.906     92.1     90.5                15.771     84.6     86.9
    (5σ, -5σ)     24.230     97.8     98.2                22.058     85.0     94.7
    H22 at         .116  .185                             .039  .024
    outliers'      .185  .316                             .024  .244
    sites
    |I - H22|      .570                                   .725
    Σh²            .231                                   .441

Table 6. Empirical Power (×100) Using $F_{1m}$ as a Function of $\lambda_2$ and $h_{12}$.

Table 8. A Further Comparison of Power (×100) of the Test Based on $F_{1m}$ When K = 1 Versus K = 2.

                                   Design 3
         K = 1                         K = 2               K = 2
      θ       at site 1     (θ1, θ2)       at sites (1,2)    at sites (1,14)
      3σ        31.3        (3σ, 3σ)           22.4              32.3
                            (3σ, -3σ)          42.2              32.5
      5σ        88.7        (5σ, 5σ)           66.2              82.3
                            (5σ, -5σ)          92.8              82.6
     -3σ        30.8        (-3σ, -3σ)         24.6              31.8
                            (-3σ, 3σ)          43.3              31.8
     -5σ        89.5        (-5σ, -5σ)         65.6              83.4
                            (-5σ, 5σ)          93.4              81.1
      H22 at    0.077                      0.077  0.077      0.077  0
      outliers'                            0.077  0.077      0      0.077
      sites
      Σh² = 0.154


5.  AN  EXAMPLE 


A brief diagnostic analysis based on the $F_{Km}$ statistics is now performed on the data from Table 1. No attempt is made here to carry out a complete analysis of this set of data; the aim is only to illustrate the use of the $F_{Km}$ statistics. For further analyses see Laishes and Rolfe (1980) and Fuchs (1980). The fitted equation was $\hat{y} = 6.38 + 10.59x$ with $R^2 = 0.89$.

The plot of y versus x indicates no obvious deviation from the fitted model. When we attempt a stepwise deletion of outliers (or, equivalently, assume at first that no more than one outlier is present), case 12 is detected as an outlier. After the deletion of case 12, case 13 is detected next. No further outliers were detected. We note that cases 12 and 13 correspond to the two jaundiced animals.

When a simultaneous detection of a pair of outliers was attempted, $Q_{2m}$ selected cases 12 and 13 as potential outliers and, in the subsequent testing procedure, both were labelled as outliers. When K = 3 was postulated, $Q_{3m}$ selected cases 6, 12 and 13 as potential outliers, but subsequent analyses identified only cases 12 and 13 as outliers. Thus all tests detected both jaundiced animals, and only these.

Next we performed the diagnostic analysis on the data from each of the three days (A, B and C) separately. No outliers were detected, not even for Experiment A, which included the two jaundiced animals. The reason is that the design in Experiment A is very "non-robust": the two jaundiced cases happen to have extreme x-values (x = 10 and x = 15, respectively). This considerably reduces the ability to detect them as outliers in data set A alone. When all data are combined, however, three more observations with x = 10 are recorded and the fact that case 12 is an outlier becomes obvious, which then leads to the detection of the second jaundiced animal.
6. CONCLUSIONS

The value of $Q_{Km}$ as an "outliers" statistic has been established for several years. In this article, we examine the distributional properties of the related statistic $F_{Km}$ for the case of a straight line regression model and for K = 1, 2. We have also extended previous investigations of $F_{Km}$ by carrying out power calculations for various designs.

We found that correctly determining K in advance has considerable influence on the power of the test. When two outliers were present, the test based on $F_{1m}$ performed more poorly than that based on $F_{2m}$. The loss in power is especially large when the two outliers are close to each other and have the same sign. A decrease in power also results from the use of $F_{2m}$ when only one outlier is present.

Box and Draper (1975) and Draper and John (1980) both recommend choosing the experimental design so that $\Sigma h_{ii}^2$ is as small as possible. This recommendation is valid when one expects random outliers at unknown positions. Here, however, the experimental design appears to affect the power through the leverage at the outliers' sites. The practical implication concerns cases where the experimenter has some prior knowledge about which experimental sites are prone to outliers (as in carcinogenic studies at high dosages). It may then be wise to decrease the h-values at those sites, even at the expense of overall robustness.

Our final comment concerns the formulas found for approximating the generated percentage points of $F_{2m}$ for the straight line situation. Previously, Draper and John (1980) remarked that, for two-way tables, the approximating formula $m = \tfrac{3}{4}n\{1 - (K+1)n/1600\}$ worked well for both K = 2 and 3. Here, for the case K = 2 and α = 0.05, a very similar formula emerged, namely $m = \tfrac{3}{4}n\{1 - (K+1)n/2000\}$. This leads us to speculate that, at least for α = 0.05, either of these formulas (which differ very little for moderate n) would provide an adequate method for obtaining critical test values in a wide variety of design circumstances. For other α values, a more general formula is offered.

